Abstract

Promoting cooperation is an intellectual challenge in the social sciences, and the iterated Prisoners’ Dilemma (IPD) is a fundamental framework for studying it. The traditional view that no simple ultimatum strategy allows one player to unilaterally control the share of the surplus has been challenged by a new class of zero-determinant (ZD) strategies introduced by Press and Dyson. In particular, extortionate ZD strategies can subdue the opponent and obtain higher scores. However, no empirical evidence has yet been found to support this theoretical finding. In a long-run laboratory experiment of the IPD that pairs each human subject with a computer co-player, we demonstrate that the extortionate strategy indeed outperforms the generous strategy against human subjects. Our results show that the extortionate strategy achieves higher scores than the generous strategy, that it raises the human co-players’ cooperation rate to a level similar to that achieved by the generous strategy, and that the human subjects’ cooperation rates increase over time in both the extortionate and the generous treatments. While our results imply that the human subjects cared about fairness or reciprocity as well as their earnings, we observe that subjects learned to become increasingly cooperative over time in order to increase their own monetary payoffs. Our experiments provide the first laboratory evidence in support of the Press-Dyson theory.
Significance Statement

(1) As predicted by Press and Dyson, the ZD strategies successfully and unilaterally enforced a linear relationship between the human players’ scores and the scores of the computer players using the ZD strategies. (2) The extortionate ZD strategists obtained significantly higher scores than the generous ZD strategists. Half of the extortionate ZD strategists even obtained scores higher than those resulting from mutual cooperation. (3) The human subjects’ cooperation rate against the extortionate ZD strategy is as high as that against the generous ZD strategy. Moreover, the human subjects’ cooperation rates increase over time in both treatments. (4) Although the human subjects display some degree of conditionally cooperative behavior, most of them adapt their behavior over time and gradually accede to extortion in the long run. Consequently, the extortionate ZD strategy outperforms the generous ZD strategy in the long run.
1 Introduction

Promoting cooperation under adverse short-term individual incentives is an important social challenge, and the iterated Prisoners’ Dilemma (IPD) has been widely studied as the game-theoretic framework representing this issue [1-3]. In a one-shot two-person Prisoners’ Dilemma, there are two pure strategies: cooperate (C) and defect (D). Each player receives $R$ if both cooperate and $P$ if both defect; if one player cooperates and the other defects, the defector receives $T$ and the cooperator receives $S$. The ordering $T > R > P > S$ makes D the better reply to either move of the opponent, so that mutual defection is the only Nash equilibrium of the game, while $2R > T + S$ implies that mutual cooperation is the socially best outcome.
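To make the two inequalities concrete, the short Python sketch below plugs in the payoff values used later in our experiment (T = 5, R = 3, P = 1, S = 0; see Figure 1) and checks that D is the best reply to either move, while mutual cooperation maximizes the joint payoff. The names `PAYOFF`, `is_prisoners_dilemma`, `best_response` and `joint_payoff` are illustrative only.

```python
# Payoff values used in the experiment (Figure 1): T = 5, R = 3, P = 1, S = 0.
T, R, P, S = 5, 3, 1, 0

# PAYOFF[(my_move, other_move)] is my payoff in a single round.
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def is_prisoners_dilemma(t, r, p, s):
    """Check the two inequalities that define the Prisoners' Dilemma."""
    return t > r > p > s and 2 * r > t + s


def best_response(other_move):
    """Move that maximizes my one-shot payoff against a fixed move of the other player."""
    return max(("C", "D"), key=lambda move: PAYOFF[(move, other_move)])


def joint_payoff(a, b):
    """Sum of both players' payoffs when I play a and the other player plays b."""
    return PAYOFF[(a, b)] + PAYOFF[(b, a)]


print(is_prisoners_dilemma(T, R, P, S))        # True
print(best_response("C"), best_response("D"))  # D D: defection dominates
outcomes = [(a, b) for a in "CD" for b in "CD"]
print(max(outcomes, key=lambda ab: joint_payoff(*ab)))  # ('C', 'C')
```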

Since the computerized tournaments conducted by Axelrod [4], kindness and fairness have appeared to offer the best chance of promoting and sustaining cooperation. A substantial number of studies suggest that reciprocity makes mutual cooperation feasible [SI [DTI, and that it is a favorable strategy in an evolutionary setting mm-

Surprisingly, Press and Dyson recently discovered a class of so-called zero-determinant (ZD) strategies, which allow a player to unilaterally enforce a linear relationship between his score and that of his opponent [18]. In particular, a subclass of ZD strategies, the extortionate strategies, can guarantee that the extortioner’s own surplus exceeds the opponent’s surplus by a fixed percentage, while making it optimal for the opponent to cooperate. This means that it is possible to maintain cooperation and pursue self-interest at the same time. In addition, an extortioner can earn a score exceeding that obtained from constant mutual cooperation. This new finding by Press and Dyson has stimulated many researchers to further investigate the performance of ZD strategies in various situations [19-25]. A key insight of the Press-Dyson theory is that if a human player’s behavior adapts in an evolutionary manner, he will tend to become more cooperative over time, and if so, the ZD strategist will achieve her maximum possible score by exploiting this cooperative tendency of her opponent [18].

In spite of these significant developments in the literature, empirical verification of the theory is far from trivial. Up to now, only one published study has tested the Press-Dyson theory in a laboratory experiment. Hilbe et al [23] provided experimental evidence on the performance of different ZD strategies played by computers against human subjects. They specified the ZD strategists to play either the extortionate strategy or the generous strategy against human subjects in the context of the IPD, in which the Press-Dyson theory predicts the extortioner to earn a higher score. They found that extortion subdues human players, but that generosity turns out to be the more profitable strategy; furthermore, the cooperation rate of human co-players against extortionate ZD strategies was only half of that against generous ZD strategies. In other words, generosity appears to be the winning strategy after all. The experimental results of Hilbe et al [23] thus appear to refute the Press-Dyson theory, and they have inspired other studies [26]. This apparent inconsistency between theory and experimental results calls for a closer examination of the discrepancy [27].

It is worth noting that in the experiment by Hilbe et al [23], the human subjects were not told that they were actually playing against a strategy executed by a computer program. Because of this feature of their experimental design, it is arguable that the effects of two factors, the ZD strategies themselves and the nature of the other player, cannot be clearly distinguished in their results. In the real world, a strategy can be executed by a human individual, but it can also be executed by a non-human mechanism. For example, a ZD strategist can be understood as an institution with its own regularities of generosity or extortion when interacting with human beings. From a Darwinian point of view, institutions compete, and the winning institution (ZD strategist) will be selected on the basis of its performance. To test the theory cleanly, we should explore the performance of the strategy independently of potentially more complicated social influences, such as human players’ attitudes and intentions towards other human players. In other words, the question is which ZD strategy performs better when playing against human subjects who know that they are interacting with a “strategized” computer rather than a human being.
Another factor that might have led to the inconsistency between the Press-Dyson theory and the experimental results of Hilbe et al [23] could be insufficient learning opportunities for the human subjects in their experimental design. ZD strategies are complicated. A ZD strategy is described by the probabilities of cooperation given the four possible outcomes of the previous round, $p = (p_1, p_2, p_3, p_4)$, where $p_i$, $i \in \{1, 2, 3, 4\}$, is the probability of cooperation given that the previous outcome was CC, CD, DC and DD, respectively. It is well understood that a strategy involving uncertainty (stochastic actions) takes longer to reveal itself than a pure strategy. The 60-round sessions implemented in the experiments by Hilbe et al [23] may not have been long enough for people to learn the details of a ZD strategy. Longer sessions, which can accommodate learning, are often preferable when subjects face probabilistic environments [28-34].
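The description above translates directly into a small simulator. The sketch below assumes a simple dictionary encoding of our own choosing (`p0` for the first-round probability and `p` for the tuple $(p_1, p_2, p_3, p_4)$, each indexed by the previous outcome with the player's own move first) and draws the moves of two memory-one players for 500 rounds. The ES and GS values are those listed in Table 1; `HYPOTHETICAL_HUMAN` is a made-up stochastic co-player used only for illustration, not a model fitted to our subjects.

```python
import random

# Outcome index from a player's own view: (own previous move, other's previous move).
OUTCOME_INDEX = {("C", "C"): 0, ("C", "D"): 1, ("D", "C"): 2, ("D", "D"): 3}

# Values from Table 1; "p" holds (p1, p2, p3, p4) after CC, CD, DC, DD.
ES = {"p0": 0.0, "p": (0.692, 0.000, 0.538, 0.000)}
GS = {"p0": 1.0, "p": (1.000, 0.182, 1.000, 0.364)}
# Hypothetical co-player for illustration only (not estimated from any data).
HYPOTHETICAL_HUMAN = {"p0": 0.5, "p": (0.9, 0.3, 0.7, 0.4)}


def next_move(strategy, previous, rng):
    """Draw C or D; `previous` is None in round 1, else (own move, other's move)."""
    if previous is None:
        prob_c = strategy["p0"]
    else:
        prob_c = strategy["p"][OUTCOME_INDEX[previous]]
    return "C" if rng.random() < prob_c else "D"


def play(strategy_x, strategy_y, rounds=500, seed=None):
    """Return a list of (x_move, y_move) pairs for an iterated game of fixed length."""
    rng = random.Random(seed)
    history = []
    previous = None  # (x_move, y_move) of the last round
    for _ in range(rounds):
        x = next_move(strategy_x, previous, rng)
        # Player y sees the same outcome with its own move first.
        y_view = None if previous is None else (previous[1], previous[0])
        y = next_move(strategy_y, y_view, rng)
        history.append((x, y))
        previous = (x, y)
    return history


history = play(ES, HYPOTHETICAL_HUMAN, rounds=500, seed=1)
human_rate = sum(y == "C" for _, y in history) / len(history)
print(history[0][0], round(human_rate, 3))  # ES always opens with D
```

Because every move is random, a single 500-round history gives only a noisy view of the four conditional probabilities; recovering them from play requires many observations of each previous outcome.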

In order to test the true performance of the ZD strategies, we conducted a laboratory experiment of the iterated Prisoners’ Dilemma that provided the human subjects with a rich learning experience over the course of 500 rounds and made it clear to the subjects that they were playing against a strategized computer. Our results show that the extortionate ZD strategy indeed outperforms the generous ZD strategy. To the best of our knowledge, this is the first experimental evidence supporting the predictions of the Press-Dyson theory.

2 Experiment and Results 

To compare the performance of different ZD strategies, we designed a laboratory experiment in which every human player faced a fixed ZD strategy, implemented by a computer program, for 500 rounds. Every human player was informed that the opponent would be a computer program.

The payoff matrix used in our experiment is shown in Figure 1; it is the same as in Refs [18, 23]. Both players receive 3 if they mutually cooperate and 1 if they mutually defect; if one player cooperates and the other defects, the defector receives 5 and the cooperator receives 0 (that is, T = 5, R = 3, P = 1 and S = 0).
Figure 1. Payoff matrix. If both players cooperate (C), each player receives 3; if one cooperates and the other defects (D), the cooperator receives 0 and the defector receives 5; if both defect, each player receives 1.

We employed two treatments: the extortionate ZD strategy (ES) and the generous ZD strategy (GS). Table 1 summarizes the experimental design. For ES, the four conditional cooperation probabilities are $(p_1, p_2, p_3, p_4) = (0.692, 0.000, 0.538, 0.000)$, and for GS they are $(p_1, p_2, p_3, p_4) = (1.000, 0.182, 1.000, 0.364)$. In addition, ES defects in the first round and GS cooperates in the first round. These are the same as the strong extortionate ZD strategy and the strong generous ZD strategy in Hilbe et al [23], respectively.

Altogether, 64 graduate and undergraduate students participated in the experiment, with 32 participants in each treatment. For further details on the implementation of the experiment, see the Materials and Methods section.
Table 1. Experimental design

Treatment   Number of human co-players   p0      p1      p2      p3      p4      Slope (s)
E           32                           0.000   0.692   0.000   0.538   0.000   1/3
G           32                           1.000   1.000   0.182   1.000   0.364   1/3

E, extortion; G, generosity.

ZD strategies are defined by five probabilities: p0, the probability of cooperating in the first round, and p1, p2, p3, p4, the four conditional cooperation probabilities given that the previous round’s outcome was CC, CD, DC and DD, respectively, from the ZD strategist’s point of view. Extortionate strategies do not cooperate in the first round, and they never cooperate after mutual defection. Generous strategies, on the other hand, cooperate in the first round, and they always cooperate after mutual cooperation. The parameter s determines the slope of the predicted payoff relationship: a slope of s = 1/3 implies that for each additional point that the ZD strategist earns, the human co-player earns an additional 1/3 point.
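The Table 1 probabilities can be recovered from the slope s. The sketch below assumes a parametrization commonly used in the ZD literature, which this section does not spell out: a ZD strategy with baseline $l$, slope $s$ and scale $\phi$ enforces $\pi_{\text{human}} - l = s(\pi_{\text{ZD}} - l)$, with $l = P = 1$ for extortion and $l = R = 3$ for generosity. Choosing $\phi$ as the largest value that keeps all four probabilities in [0, 1] is also our own assumption; it happens to reproduce the Table 1 entries exactly (9/13 ≈ 0.692, 7/13 ≈ 0.538, 2/11 ≈ 0.182, 4/11 ≈ 0.364). Exact fractions avoid rounding noise.

```python
from fractions import Fraction

R, S, T, P = 3, 0, 5, 1  # payoffs from Figure 1


def zd_coefficients(l, s):
    """(a_i, b_i) such that p_i = a_i + b_i * phi for p1..p4 (after CC, CD, DC, DD).

    Assumed parametrization: the strategy enforces pi_coplayer - l = s * (pi_zd - l).
    """
    return [
        (1, -(1 - s) * (R - l)),          # p1
        (1, -((T - l) + s * (l - S))),    # p2
        (0, (l - S) + s * (T - l)),       # p3
        (0, (1 - s) * (l - P)),           # p4
    ]


def max_phi(l, s):
    """Largest phi > 0 for which every p_i stays inside [0, 1]."""
    bounds = []
    for a, b in zd_coefficients(l, s):
        if b > 0:
            bounds.append((1 - a) / b)   # from p_i <= 1
        elif b < 0:
            bounds.append(-a / b)        # from p_i >= 0
    return min(bounds)


def zd_strategy(l, s, phi):
    """Conditional cooperation probabilities (p1, p2, p3, p4) for given l, s, phi."""
    return [a + b * phi for a, b in zd_coefficients(l, s)]


slope = Fraction(1, 3)
for name, baseline in (("ES", P), ("GS", R)):
    probs = zd_strategy(baseline, slope, max_phi(baseline, slope))
    print(name, [round(float(p), 3) for p in probs])
# ES [0.692, 0.0, 0.538, 0.0]
# GS [1.0, 0.182, 1.0, 0.364]
```

The same coefficients explain the structural remarks in the table notes: setting l = P makes the p4 coefficient zero, so extortionate strategies never cooperate after mutual defection, and setting l = R makes p1 = 1, so generous strategies always cooperate after mutual cooperation.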


Table 2. Summary of the experimental results

Treatment   Number of players   Score per round (mean ± s.d.)     Cooperation rate (mean ± s.d.)
                                Human           ZD strategist     Human           ZD strategist
ES          32                  1.703 ± 0.200   2.943 ± 0.511     0.684 ± 0.198   0.436 ± 0.137
GS          32                  2.746 ± 0.260   2.263 ± 0.788     0.645 ± 0.349   0.741 ± 0.245


2.1 Comparison of the score performance of two ZD strategies 

Table 2 gives an overview of the experimental results, and Figure 2 shows the resulting average scores over all 500 rounds of the game for each of the two treatments. On average, the extortioners earn 2.943 ± 0.511 (mean ± s.d.) points per round and the generous strategists earn 2.263 ± 0.788 (mean ± s.d.) points per round. Extortionate strategists earn significantly higher scores than their generous counterparts (Mann-Whitney test, n_E = n_G = 32, z = 4.196, p < 0.001). The average score of the extortionate strategists is 30% higher than that of the generous strategists. Furthermore, half of the extortioners even earned more than 3 points per round, which is the score associated with constant mutual cooperation. In contrast, no generous strategist earned more than 3. This result matches the theoretical prediction for ZD strategies [18] well. For further details, see SI.
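Why an extortioner can exceed the mutual-cooperation score of 3 can be checked with the long-run (stationary) payoffs of two memory-one strategies. The sketch below assumes the Markov-chain treatment that is standard in the ZD literature: it iterates the distribution over the four outcomes until it settles, then pits ES and GS against a hypothetical co-player who always cooperates (`ALL_C`), which the theory predicts to be the best response. ES uses the exact fractions 9/13 and 7/13, which round to the Table 1 entries 0.692 and 0.538; GS uses 2/11 and 4/11 (0.182 and 0.364). `HYPOTHETICAL_CO` is an arbitrary stochastic co-player used only to illustrate the enforced linear relationship.

```python
R, S, T, P = 3, 0, 5, 1
# Outcomes indexed from the ZD player's view: CC, CD, DC, DD (own move first).
PAY_ZD = (R, S, T, P)
PAY_CO = (R, T, S, P)
SWAP = (0, 2, 1, 3)  # index of the same outcome seen from the co-player's view

ES = (9 / 13, 0.0, 7 / 13, 0.0)
GS = (1.0, 2 / 11, 1.0, 4 / 11)
ALL_C = (1.0, 1.0, 1.0, 1.0)  # hypothetical fully cooperative co-player


def transition_matrix(p, q):
    """Row i gives the probabilities of moving from outcome i to CC, CD, DC, DD."""
    rows = []
    for i in range(4):
        x, y = p[i], q[SWAP[i]]
        rows.append([x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y)])
    return rows


def long_run_payoffs(p, q, iterations=2000):
    """Average per-round payoffs (ZD, co-player) under the stationary distribution.

    Assumes the chain converges from a uniform start, which holds for the examples here.
    """
    m = transition_matrix(p, q)
    v = [0.25] * 4
    for _ in range(iterations):
        v = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    return (sum(a * b for a, b in zip(v, PAY_ZD)),
            sum(a * b for a, b in zip(v, PAY_CO)))


for name, p in (("ES", ES), ("GS", GS)):
    zd, co = long_run_payoffs(p, ALL_C)
    print(name, round(zd, 3), round(co, 3))
# ES 3.727 1.909  (exactly 41/11 and 21/11)
# GS 3.0 3.0

HYPOTHETICAL_CO = (0.9, 0.3, 0.7, 0.4)  # arbitrary co-player, illustration only
zd, co = long_run_payoffs(ES, HYPOTHETICAL_CO)
print(co - 1 - (zd - 1) / 3)  # close to 0: pi_co - P = (pi_ZD - P) / 3
```

Against a fully cooperative co-player, the extortioner's long-run score (41/11 ≈ 3.73) exceeds the mutual-cooperation score of 3, whereas the generous strategy settles into mutual cooperation and earns exactly 3.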

2.2 Comparison of the effectiveness of the two ZD strategies in promoting cooperation

Table 2 also displays the overall cooperation rates. On average, the cooperation rate of human co-players is 0.684 ± 0.198 (mean ± s.d.) in the ES treatment and 0.645 ± 0.349 (mean ± s.d.) in the GS treatment. There is no significant difference between the two treatments (Mann-Whitney test, n_E = n_G = 32, z = 0.537, p = 0.591).

Figure 2. Average scores in the two treatments for human subjects and the ZD strategies. The scores of the extortionate strategy (red filled bar) are higher than those of the generous strategy (green filled bar). Three grey stars indicate significance at the level α = 0.001 (Mann-Whitney test, n_E = n_G = 32). In terms of average scores, the extortioners stay ahead of their human co-players (red empty bar), whereas the generous strategists fall behind their human opponents (green empty bar). The error bars indicate the standard errors.

In accordance with the similarity in cooperation rates, the two ZD strategies also generate similar dynamic patterns. Figure 3 shows the cooperation rate of human subjects over time for each treatment. In each treatment, the cooperation rate starts from a relatively low level and then follows an increasing path. For the ES treatment, the cooperation rate in the last 100 rounds, 0.757 ± 0.220 (mean ± s.d.), is significantly higher than that in the first 100 rounds, 0.563 ± 0.221 (Wilcoxon signed-rank test, n = 32, z = 4.207, p < 0.001). For the GS treatment, the cooperation rate in the last 100 rounds, 0.716 ± 0.386 (mean ± s.d.), is also significantly higher than that in the first 100 rounds, 0.513 ± 0.327 (Wilcoxon signed-rank test, n = 32, z = 3.314, p < 0.001).
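The within-treatment comparison of the first and last 100 rounds pairs two numbers per subject. The sketch below is one way to compute the Wilcoxon signed-rank statistic with the usual normal approximation: zero differences are dropped, tied absolute differences receive average ranks, the variance is tie-corrected, and no continuity correction is applied. It takes per-subject counts of C choices in each 100-round window (integers, so ties are detected exactly); the counts in the example are hypothetical. The software actually used for the paper is not specified here, so its reported statistics may differ slightly, for example if it applies a continuity correction.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Two-tailed Wilcoxon signed-rank test (normal approximation).

    Returns (W+, z, p). Zero differences are dropped, tied |differences| share
    average ranks, and the variance is tie-corrected; no continuity correction.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return w_plus, z, p


# Hypothetical counts of C choices in rounds 1-100 and 401-500 for five subjects.
first_100 = [40, 55, 60, 30, 70]
last_100 = [60, 80, 58, 45, 90]
print(wilcoxon_signed_rank(first_100, last_100))
```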

Our results suggest that the extortionate strategy and the generous strategy are similarly effective at promoting cooperative behavior. These results strongly support the theoretical prediction for ZD strategies [18]. A theoretical explanation for the increasing trend in the cooperation rate is provided in SI.

2.3 The score relationship between ZD strategies and human players 

Table 2 shows the average scores per round for both human subjects and computer programs, and Figure 2 illustrates the results. The extortioners earn higher scores than their human co-players (Wilcoxon matched-pairs signed-rank test, n_E = n_human = 32, z = 4.937, p < 0.001). In contrast, the generous ZD strategists earn lower scores than their human co-players (Wilcoxon matched-pairs signed-rank test, n_G = n_human = 32, z = 4.910, p < 0.001). In addition, the relationship between the scores of the ZD strategists and those of their human co-players follows the predicted linear relationship, as shown in Figure 4. These results suggest that ZD strategies can indeed unilaterally enforce a linear relationship between human players’ scores and their own scores, consistent with the theoretical prediction for ZD strategies (for more details, see SI).
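Because the enforced relationship is linear, it also holds for expected averages, so the mean scores in Table 2 can be checked directly. The sketch below assumes a parametrization commonly used in the ZD literature (not spelled out in this paper): $\pi_{\text{human}} - l = s(\pi_{\text{ZD}} - l)$ with $s = 1/3$, $l = P = 1$ for extortion and $l = R = 3$ for generosity. It uses only the reported means; finite 500-round games and early-round learning mean that an exact match is not expected.

```python
# Reported mean scores per round (Table 2): (ZD strategist, human co-player).
TABLE_2_MEANS = {"ES": (2.943, 1.703), "GS": (2.263, 2.746)}
# Assumed baselines: l = P = 1 for extortion, l = R = 3 for generosity.
BASELINE = {"ES": 1.0, "GS": 3.0}
SLOPE = 1 / 3  # s from Table 1


def predicted_human_score(zd_score, baseline, slope=SLOPE):
    """Co-player score implied by pi_human - l = s * (pi_ZD - l)."""
    return baseline + slope * (zd_score - baseline)


for name, (zd, human) in TABLE_2_MEANS.items():
    predicted = predicted_human_score(zd, BASELINE[name])
    print(name, "predicted", round(predicted, 3), "observed", human)
# ES predicted 1.648 observed 1.703
# GS predicted 2.754 observed 2.746
```

In both treatments the predicted and observed human means differ by less than 0.06 points per round.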

3 Discussion 

Our long-run experiment showed that the extortionate strategy significantly outperformed the generous strategy. The extortionate strategy earned higher scores than the generous strategy while promoting the human co-players’ cooperative behavior just as effectively. We also found that the ZD strategies were successful in unilaterally enforcing a linear relationship between human players’ scores and their own scores. These findings confirm the ZD theory [18] precisely. In particular, half of the extortionate strategists earned more than 3 points per round on average, indicating that the extortionate strategist can in fact earn higher payoffs than under constant mutual cooperation.

Figure 3. Human cooperation rates over the course of the game. The graph shows the fraction of cooperating human subjects in each round for the ES treatment (left) and the GS treatment (right). Dots represent the average human cooperation rate in a round within a treatment, and the shaded areas depict the 95% confidence interval. Rising trends in cooperation appear in both treatments (Spearman’s rank correlation; for ES, n = 500, ρ = 0.556, p < 0.001; for GS, n = 500, ρ = 0.757, p < 0.001).

Our results differ considerably from those of the previous experimental investigation by Hilbe et al [23]. In their experiment, the generous strategies earned higher scores than the extortionate strategies, the human co-players’ cooperation rate in the extortion treatment was only half of that in the generosity treatment, and the rising trend in the cooperation rate appeared only in the generosity treatments. Whereas their results appeared to raise questions about the empirical validity of the ZD theory, our results provide evidence in support of it.

As mentioned above, our experimental design differs from the previous investigation [23] in two key features. First, we informed the subjects that they were playing against a computer, whereas the subjects in the previous investigation were not given this information. Second, we provided 500 rounds of play, whereas the previous study provided 60 rounds. Human subjects may need lengthy periods of time to understand the decision problem or strategic issue they face (see Chapter 1 in [35]).

Figure 4. Theoretical prediction for the expected scores and experimental performance. The grey-shaded area depicts the space of possible scores for the ZD strategy implemented by the computer program (x axis) and the human co-player (y axis). The red (green) line corresponds to the theoretical prediction for the expected scores of extortion (generosity), and the open red (green) circles indicate the outcomes of the ES (GS) treatment. Each circle represents one pair of a ZD strategy and a human player.

Because the previous investigation did not make the nature of the opponent (a computer or a human being) apparent to the human subjects [23], the authors attributed the inconsistency between the theoretical predictions and the experimental results entirely to a preference for conditional cooperation [36-38] or fairness [39-41]. Existing literature has found that human individuals generally show and acquire a stronger sense of fairness when playing against other human individuals than when playing against computer programs [42, 43]. Hilbe et al showed that people cooperate more when the opponent cooperated in the previous round [22]. Human subjects in their study tended to defect when playing against the extortionate strategies, because the extortioner often defected. We also found a propensity for conditional cooperation in our experiment, in which the human subjects knew for sure that they faced a computer program: human subjects’ cooperation rates were higher if the ZD strategist had cooperated in the previous round than otherwise. On average, human cooperation rates were 78.45% if the extortionate strategist cooperated in the previous round and 60.83% otherwise, and 66.49% if the generous strategist cooperated in the previous round and 41.20% otherwise. However, we noticed that even when the opponent had defected in the previous round, the cooperation rates remained high, especially in the ES treatment. This cannot be explained by conditional cooperation. Indeed, earning a high score against a ZD strategy requires precisely that the human co-player cooperate unconditionally.
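The conditional cooperation rates quoted above are frequencies of a human C conditioned on the ZD strategist’s move in the previous round. A minimal sketch of that computation for a single game log is shown below, assuming the log is a list of (ZD move, human move) pairs per round. How the rates are aggregated across the 32 subjects of a treatment (pooling rounds or averaging per subject) is not specified in this section, so the sketch stops at one game. The five-round log is hypothetical.

```python
def conditional_cooperation(history):
    """history: list of (zd_move, human_move) per round, moves are "C" or "D".

    Returns (P(human C | ZD cooperated last round), P(human C | ZD defected last round)).
    A rate is NaN if the corresponding condition never occurs.
    """
    counts = {"C": [0, 0], "D": [0, 0]}  # ZD's previous move -> [human C count, rounds]
    for (zd_prev, _), (_, human_now) in zip(history, history[1:]):
        counts[zd_prev][1] += 1
        if human_now == "C":
            counts[zd_prev][0] += 1
    return tuple(c / n if n else float("nan") for c, n in (counts["C"], counts["D"]))


log = [("D", "C"), ("C", "C"), ("C", "C"), ("D", "D"), ("D", "C")]
print(conditional_cooperation(log))  # (0.5, 1.0)
```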

We also observed that the increasing trend in cooperation lasted for half of the 500 rounds in both treatments. This suggests a learning process through which the human subjects came to realize that cooperation is beneficial. In the experiment, a human subject might have noticed that for each of the four possible previous outcomes CC, CD, DC and DD, she/he had two choices, C and D, and could therefore try to work out eight conditional scores in total. Table 3 shows the theoretical values of these scores, given that each ZD strategist plays a fixed conditional strategy $(p_1, p_2, p_3, p_4)$. For the human subjects, whatever the previous outcome, D yields a higher expected score than C in the current round in both treatments. This might be why the cooperation rates in the early stages are relatively low. However, as time goes on, an intelligent and careful player would find that, in both treatments, the lowest score under the first two conditions (CC and CD, in which she/he cooperated in the previous round) is higher than the highest score under the last two conditions (DC and DD) (a theoretical analysis will be provided in SI), so the right choice is C. Thus, the cooperation rates followed a gradually increasing path. This result also indicates that one may not be able to find evidence in support of the ZD theory [18] if the number of rounds of play is not sufficiently large.
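The entries of Table 3 follow from a one-round calculation: if the ZD strategist cooperates with probability q in the current round, a human who plays C expects 3q, and a human who plays D expects 5q + 1(1 − q) = 1 + 4q. The sketch below recomputes the table from the Table 1 probabilities and checks the two observations made in the text; the helper names are illustrative. The key link is that the human’s current move becomes the first letter of the next round’s outcome (from the human’s view), so playing C now guarantees that the next round falls under CC or CD.

```python
R, S, T, P = 3, 0, 5, 1

ES = (0.692, 0.000, 0.538, 0.000)  # (p1, p2, p3, p4) from the ZD strategist's view
GS = (1.000, 0.182, 1.000, 0.364)


def human_view(p):
    """Map the previous outcome seen by the human (own move first) to the ZD cooperation probability."""
    p1, p2, p3, p4 = p
    return {"CC": p1, "CD": p3, "DC": p2, "DD": p4}


def one_round_scores(p):
    """Human's expected score this round for each previous outcome and each choice."""
    return {prev: {"C": R * q + S * (1 - q), "D": T * q + P * (1 - q)}
            for prev, q in human_view(p).items()}


for name, p in (("ES", ES), ("GS", GS)):
    table = one_round_scores(p)
    d_dominates = all(row["D"] > row["C"] for row in table.values())
    worst_after_c = min(v for prev in ("CC", "CD") for v in table[prev].values())
    best_after_d = max(v for prev in ("DC", "DD") for v in table[prev].values())
    print(name, {k: {m: round(v, 3) for m, v in row.items()} for k, row in table.items()})
    print("  D better this round:", d_dominates,
          "| worst after C > best after D:", worst_after_c > best_after_d)
```

Both checks hold in both treatments: D is always the better choice for the current round, yet every score reachable after cooperating (at least 1.614 under ES, 3.000 under GS) exceeds every score reachable after defecting (at most 1.000 under ES, 2.456 under GS).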

In economics, a rational agent is assumed to maximize her/his own benefit, and ZD strategies have the feature of ensuring that the opponent’s best response is to cooperate fully. While our results suggest that the human subjects cared about fairness or reciprocity as well as their earnings, we do observe that subjects learned to become increasingly cooperative over time in order to increase their own monetary payoffs. Evolution has an essential impact on both the human subjects and the ZD strategist. As predicted by the Press-Dyson theory, in our laboratory experiments we have observed human behavior evolving towards full cooperation, while the extortionate strategy outperforms generosity and wins by social selection.

Table 3. Human’s expected scores of conditional strategies

Treatment   Human’s choice   CC (p1)   CD (p3)   DC (p2)   DD (p4)
ES          C                2.076     1.614     0.000     0.000
ES          D                3.768     3.152     1.000     1.000
GS          C                3.000     3.000     0.546     1.092
GS          D                5.000     5.000     1.728     2.456

CC, CD, DC and DD are the previous outcomes from the human subject’s view; the ZD strategist’s cooperation probability that applies in each case is given in parentheses. For the details of the calculation, see SI.


4 Materials and Methods 

4.1 Data source and Experimental setting 

The data used here come from our laboratory experiments, which were conducted at the Experimental Social Science Laboratory of Zhejiang University on 19 and 21 November 2014. A total of 64 undergraduate and graduate students from various disciplines were recruited to participate in the experiment, each participating only once. In total, we collected 64,000 observations of individual decision making (64 pairs × 500 rounds × 2 decisions per round), consisting of the choices of the human subjects and of the ZD strategies implemented by computer programs.

There were two treatments in total: the extortionate ZD strategy (ES) and the generous ZD strategy (GS). Each treatment had 32 participants, half of them male and half female. Each experimental session consisted of 4 pairs of human players and computers, and there were 16 sessions in total. On average, each session lasted 1 hour. During the experiment, the players earned scores according to their choices and the payoff matrix (see Figure 1). After the experiment, each subject’s total score was converted to cash according to an exchange rate and paid to the subject. The average earnings were about 50 Yuan RMB, including the 5 Yuan show-up fee.

Before the formal experiment, the subjects practiced a matching pennies game against a computer to become acquainted with the laboratory setting. They were then given a set of materials, including an instruction manual (see SI), an informed consent form and a recording chart for their use, and each played the game in a small isolated room with a computer. Oral instructions were also given. Subjects made decisions by clicking the option C or D on the screen. The software for the experiment was designed by the authors. No communication was allowed, and the subjects were asked to switch their mobile phones to silent mode and seal them in an envelope until the end of the session. Considering the complexity of the ZD strategies, we provided the human subjects with paper and pen to record their decisions and scores round by round. The human subjects were told that they would play a game with a fixed computer program for more than 500 rounds, and that after the 500th round the game would end after each round with a probability of 0.1. This setting allows us to obtain 500 rounds of data for each subject while avoiding end-of-game effects [44].
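The stopping rule can be sketched as follows. We read “after the 500th round, the game will end after each round with a probability of 0.1” as a stopping check after round 500 and after every later round; under that reading (our interpretation), the number of extra rounds is geometric with mean 0.9/0.1 = 9, so sessions last 509 rounds on average, while only the first 500 rounds enter the analysis. If the first check happened only after round 501, every length below shifts up by one round. The constants and function names are illustrative.

```python
import random

GUARANTEED_ROUNDS = 500
STOP_PROB = 0.1


def session_length(rng, guaranteed=GUARANTEED_ROUNDS, stop_prob=STOP_PROB):
    """Rounds actually played, assuming a stop check after round `guaranteed` and every later round."""
    rounds = guaranteed
    while rng.random() >= stop_prob:  # continue with probability 1 - stop_prob
        rounds += 1
    return rounds


def expected_length(guaranteed=GUARANTEED_ROUNDS, stop_prob=STOP_PROB):
    """Mean of guaranteed + extra rounds, where extra rounds count continuations before the first stop."""
    return guaranteed + (1 - stop_prob) / stop_prob


rng = random.Random(2014)
lengths = [session_length(rng) for _ in range(20000)]
print(expected_length(), sum(lengths) / len(lengths))
```

Because subjects cannot tell which round will be the last, there is no known final round on which to defect, which is the end-of-game effect the design is meant to avoid.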
4.2 Statistical methods

Throughout the paper, we use two-tailed non-parametric tests for our statistical analysis. Taking each iterated game between a human co-player and the computer as our statistical unit, we have 32 independent observations for each of the two treatments and for each of the two players (the computer and the human player). Specifically, we used the Mann-Whitney test for comparisons between treatments and the Wilcoxon signed-rank test for comparisons within a treatment. In addition, we used Spearman’s rank correlation test to test for trends.
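For readers who want to reproduce such tests without specialized software, the sketch below implements two of them with the Python standard library: the Mann-Whitney test (normal approximation with tie-corrected variance, no continuity correction) and Spearman’s rank correlation (the Pearson correlation of average ranks). These are textbook formulas, not a description of the software used for the paper; packages that apply a continuity correction or compute exact p-values will report slightly different numbers. The inputs in the example are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_z(x, y):
    """Two-tailed Mann-Whitney test: returns (U for x, z, p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = {}
    for v in pooled:
        ties[v] = ties.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in ties.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, z, math.erfc(abs(z) / math.sqrt(2))


def spearman_rho(x, y):
    """Spearman's rank correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-subject scores in two treatments (illustration only).
group_a = [2.9, 3.1, 2.7, 3.4, 2.8]
group_b = [2.1, 2.5, 1.9, 2.6, 2.3]
print(mann_whitney_z(group_a, group_b))
# Hypothetical per-round cooperation fractions against round number.
print(spearman_rho(range(1, 6), [0.40, 0.50, 0.45, 0.60, 0.70]))
```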